Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples
Self-paced learning and hard example mining re-weight training instances to improve learning accuracy. This paper presents two improved alternatives based on lightweight estimates of sample uncertainty in stochastic gradient descent (SGD): the variance in predicted probability of the correct class across iterations of mini-batch SGD, and the proximity of the correct class probability to the decision threshold. Extensive experimental results on six datasets show that our methods reliably improve accuracy in various network architectures, including additional gains on top of other popular training techniques, such as residual learning, momentum, ADAM, batch normalization, dropout, and distillation.
Reviews: Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples
The paper proposes a novel way of sampling (or weighting) data points during the training of neural networks. The idea is to sample more often those data points that could potentially be classified well but are hard to learn (in contrast to outliers or wrongly labeled ones). To find them, the authors propose two schemes (four if split into sampling and weighting). The first (SGD-*PV) weights data points according to the variance of the predictive probability of the true label plus its confidence interval, under the assumption that the prediction probability is Gaussian distributed. The second (SGD-*TC), as far as I understand, encodes whether the probability of choosing the correct label, given past prediction probabilities, is close to the decision threshold. The statistics needed (means and variances of p) can be computed on the fly during a burn-in phase of the optimizer; they can be obtained from a forward pass of the network, which is computed anyway.
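To make the two schemes concrete, here is a minimal Python sketch of how such weights might be computed from a per-sample history of true-label probabilities. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `eps` smoothing constant, the specific widening term `var**2 / (n - 1)` standing in for the confidence interval, the use of `p * (1 - p)` as a closeness score around a 0.5 threshold, and the normalization to mean 1 are all choices made here for illustration.

```python
import numpy as np


def _normalize(scores):
    # Rescale so the weights average to 1, keeping the overall loss scale unchanged.
    return scores / scores.mean()


def prediction_variance_weights(history, eps=1e-3):
    """Weights that grow with the variance of the recorded true-label probabilities.

    history: array of shape (n_samples, n_recorded); entry [i, t] is the
    probability the model assigned to the true label of sample i at the
    t-th recorded forward pass (hypothetical layout).
    """
    history = np.asarray(history, dtype=float)
    n = history.shape[1]
    if n < 2:
        raise ValueError("need at least two recorded probabilities per sample")
    var = history.var(axis=1, ddof=1)  # unbiased sample variance per sample
    # Assumed widening term: inflate the estimate when few observations exist.
    widened = var + var ** 2 / (n - 1)
    return _normalize(np.sqrt(widened) + eps)


def threshold_closeness_weights(history, eps=1e-3):
    """Weights that peak when the mean true-label probability is near 0.5."""
    history = np.asarray(history, dtype=float)
    p_bar = history.mean(axis=1)
    # p * (1 - p) is largest at p = 0.5 and vanishes at 0 and 1.
    return _normalize(p_bar * (1.0 - p_bar) + eps)


# Toy history: an easy sample, an uncertain sample, and a likely outlier.
history = np.array([
    [0.98, 0.99, 0.99],
    [0.20, 0.80, 0.50],
    [0.02, 0.01, 0.03],
])
pv_weights = prediction_variance_weights(history)
tc_weights = threshold_closeness_weights(history)
```

In both cases the uncertain middle sample receives the largest weight, while the consistently easy sample and the consistently misclassified one are de-emphasized. The resulting weights could multiply per-sample losses or, after division by their sum, serve as sampling probabilities.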
Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples
Chang, Haw-Shiuan, Learned-Miller, Erik, McCallum, Andrew
Papers published at the Neural Information Processing Systems Conference.